(54) Text searching system

(57) The invention relates to a text searching system
(30,60) for searching web pages according to keyword and
classification data (46,64) provided by a user. The text
searching system (30,60) comprises a computer having a
memory (32) for storing programs and data and a processor
(34) for executing the programs stored in the memory (32),
a text data file (36) stored in the memory (32) having text
data (38) of web pages of a plurality of world wide web
sites, a text index file (40) stored in the memory (32)
having keyword searching data (42) for searching keywords
contained in the text data (38) of each of the web pages of
the text data file (36), a classification index file (44,62)
stored in the memory (32) having classification data (46,64)
corresponding to the classification (54) of each of the web
pages of the text data file (36), and a searching program
(48,66) stored in the computer for searching the text index
file (40) and the classification index file (44,62)
according to keyword and classification data (46,64)
provided by a user so as to find text data (38) which are
matched with the user provided keyword data and contained
in a plurality of target web pages whose classifications
(54) are matched with the user provided classification data
(46,64) in the text data file (36).
Description

[0001] The invention relates to a text searching system
according to the pre-characterizing portion of claim 1.

[0002] As the number of web pages on the internet
increases, a searching system becomes necessary for
searching the myriad of web pages for specific information.
A corresponding prior art searching system comprises a
computer having a memory in which a text data file, a text
index file comprising keyword searching data, and a
searching program are stored. Since the system uses a
keyword for searching web pages, the text data of all the
web pages containing the keyword are returned. This
requires an excessive amount of transmission time, and most
of the transmitted web pages do not fit the classification
the user has in mind. Therefore, additional search and
transmission time must be spent. For example, if the user
wants to search for web pages of movies containing
references to "tornado", the searching system will transmit
to the user the text data of all web pages containing the
word "tornado". However, these transmitted web pages will
include irrelevant pages concerning unrelated topics such
as meteorology, history and news. Therefore, more time must
be spent manually selecting the pages that are actually
pertinent.

[0003] With these problems in mind, the present invention
aims at providing a text searching system for searching web
pages according to a keyword which, nevertheless, is more
economical in terms of both transmission time and
subsequent manual selection.
[0004] This is achieved by the present invention as
claimed in claim 1. The dependent claims define
advantageous further developments of the respective
invention.

[0005] Because the system according to the invention
additionally includes a classification index file having
classification data, and a searching program for searching
text data matching both user provided keyword data and user
provided classification data, the search can be performed
in a much more precisely defined manner, thus avoiding the
output of unrelated pages.
[0006] In the following, the invention is described in
more detail with reference to the accompanying drawings, in
which

Fig. 1 is a functional block diagram of a prior art
searching system as mentioned above,
Fig. 2 is a perspective diagram of the keyword searching
data in the system of Fig. 1,
Fig. 3 is a functional block diagram of a text searching
system according to the present invention, and
Fig. 4 is a perspective diagram of another text searching
system according to the present invention.

[0007] The prior art text searching system 10 shown in
Fig. 1 comprises a computer (not shown), a text data file
16, a text index file 20, and a searching program 24. The
computer comprises a memory 12 for storing programs and
data and a processor 14 for executing the programs stored
in the memory 12. The text data file 16, text index file
20, and searching program 24 are stored in the memory 12.
The text data file 16 has text data 18 of web pages of a
plurality of world wide web sites. The text index file 20
has keyword searching data 22 for searching keywords
contained in the text data 18 of each of the web pages of
the text data file 16. The searching program 24 is used for
searching the text index file 20 according to keyword data
provided by a user so as to find text data 18 of all the
web pages having the user provided keyword data in the text
data file 16.

[0008] As can be seen in Fig. 2, the keyword searching
data 22 of the text index file 20 are built according to
the text data 18 of the text data file 16. Each set of
keyword searching data 22 has a keyword 21 and address data
23 of the keyword 21 in all web pages. As shown in Fig. 2,
the address data of the keyword "world" in all web pages
are a1, a2, a3...; the address data of the keyword "world
wide web" in all web pages are c1, c2, c3.... When the user
inputs a keyword, the searching program 24 searches the
text index file 20 according to the keyword provided by the
user to find the keyword searching data 22 corresponding to
the keyword for getting the address data of the keyword in
all web pages. Finally, the text data file 16 is used for
transmitting to the user the text data 18 of all web pages
having the keyword.
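
The prior art lookup described above can be sketched as a simple inverted index. This is only an illustrative sketch under stated assumptions, not the implementation of the system 10: the sample pages, the use of page identifiers such as "a1" as address data, and the function names `build_text_index` and `keyword_search` are all hypothetical, and only single-word keywords are indexed here.

```python
from collections import defaultdict

# Hypothetical stand-in for the text data file 16: address -> text data.
TEXT_DATA_FILE = {
    "a1": "tornado chaser movie review",
    "a2": "tornado warning issued by the weather service",
    "a3": "history of the county fair",
}


def build_text_index(text_data_file):
    """Build keyword searching data: keyword -> sorted page addresses."""
    index = defaultdict(set)
    for address, text in text_data_file.items():
        for word in text.lower().split():
            index[word].add(address)
    return {keyword: sorted(addresses) for keyword, addresses in index.items()}


def keyword_search(text_index, text_data_file, keyword):
    """Return the text data of every page that contains the keyword."""
    addresses = text_index.get(keyword.lower(), [])
    return {address: text_data_file[address] for address in addresses}


TEXT_INDEX = build_text_index(TEXT_DATA_FILE)
RESULTS = keyword_search(TEXT_INDEX, TEXT_DATA_FILE, "tornado")
```

In this sketch a search for "tornado" returns both the movie page and the weather page, which is the over-delivery the description attributes to keyword-only searching.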

[0009] As mentioned before, because the prior art
searching system 10 uses a keyword for searching web pages,
the text data of all web pages containing the keyword are
returned. This takes an excessive amount of time to
transmit. In searching for the web pages within a specific
classification, the searching system 10 transmits the text
data of all the web pages containing the keyword to the
user, but most of the transmitted web pages are not well
matched with the classification the user has in mind.
Therefore, more search and transmission time must be spent
and, even so, the pertinent pages must finally be selected
manually.

[0010] A text searching system 30 according to the
present invention, as shown in Fig. 3, comprises a computer
(not shown), a text data file 36, a text index file 40, a
classification index file 44, and a searching program 48.
The computer comprises a memory 32 for storing programs and
data and a processor 34 for executing the programs stored
in the memory 32. The text data file 36, text index file
40, classification index file 44 and searching program 48
are stored in the memory 32. The text data file 36 has text
data 38 of web pages of a plurality of world wide web
sites. The text index file 40 has keyword searching data 42
for searching keywords contained in the text data 38 of
each of the web pages of the text data file 36. The
classification index file 44 has classification data 46
corresponding to the classification of each of the web
pages of the text data file 36. The searching program 48 is
used for searching the text index file 40 and the
classification index file 44 according to keyword and
classification data provided by a user so as to find text
data 38 which are matched with the user provided keyword
data and contained in a plurality of target web pages whose
classifications are matched with the user provided
classification data in the text data file 36.
[0011] The keyword searching data 42 of the text index
file 40 is built according to the text data 38 of the text
data file 36. Each keyword searching data 42 has a keyword
and address data of the keyword in all web pages. Each
classification data 46 of the classification index file 44
has a plurality of classifications 54, and each
classification 54 has web page data 50 of all the web pages
belonging to the classification. Each web page data 50
comprises keyword position indexing data 52 of the web
page. The keyword position indexing data 52 is used for
pointing to the positions of the keyword searching data 42
of the specific web page contained in the text index file
40.

[0012] When a user inputs keyword and classification
data, the searching program 48 searches the classification
index file 44 according to the classification data provided
to find the web page data 50 of all web pages belonging to
the classification data. Then, the searching program 48
locates the position of the keyword searching data 42 of
the text data 38 of each such web page in the text index
file 40 according to the keyword position indexing data 52
of the web page data 50. Then, the searching program 48
searches the keyword searching data 42 of all web pages
belonging to the classification data in the text index file
40 according to the keyword provided by the user to find
the text data 38 of all web pages which belong to the
classification data and have the keyword. Finally, the text
data file 36 is used for transmitting the text data 38 of
all web pages belonging to the classification data and
having the keyword to the user.
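
The classification-first search of the system 30 might be sketched as follows. This is a minimal sketch under assumptions, not the claimed implementation: the text index file is modelled as a flat list of (keyword, page address) records, the keyword position indexing data as a list of record positions per page, and the sample pages, classifications and function names are hypothetical.

```python
# Hypothetical stand-ins for the text data file 36 and page classifications.
TEXT_DATA_FILE = {
    "p1": "tornado chaser movie review",
    "p2": "tornado warning issued today",
    "p3": "romantic comedy movie review",
}
PAGE_CLASSIFICATION = {"p1": "movies", "p2": "meteorology", "p3": "movies"}


def build_indexes(text_data_file, page_classification):
    """Build a text index and a classification index that points into it."""
    text_index = []  # list of (keyword, page address) records
    classification_index = {}  # classification -> {address: [positions]}
    for address, text in text_data_file.items():
        positions = []
        for keyword in sorted(set(text.lower().split())):
            positions.append(len(text_index))  # keyword position indexing data
            text_index.append((keyword, address))
        pages = classification_index.setdefault(page_classification[address], {})
        pages[address] = positions
    return text_index, classification_index


def search_classification_first(text_index, classification_index,
                                text_data_file, keyword, classification):
    """Find pages of one classification, then test only their index records."""
    matches = {}
    pages = classification_index.get(classification, {})
    for address, positions in pages.items():
        if any(text_index[pos][0] == keyword.lower() for pos in positions):
            matches[address] = text_data_file[address]
    return matches


TEXT_INDEX, CLASSIFICATION_INDEX = build_indexes(TEXT_DATA_FILE, PAGE_CLASSIFICATION)
RESULTS = search_classification_first(
    TEXT_INDEX, CLASSIFICATION_INDEX, TEXT_DATA_FILE, "tornado", "movies")
```

Only the index records of pages in the requested classification are examined, so the weather page containing "tornado" is never considered in this sketch.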

[0013] Fig. 4 is a perspective diagram of another text
searching system 60 according to the present invention. The
classification index file 62 of the text searching system
60 contains the classification data 64 of the web pages of
each keyword searching data 42 in the text index file 40.
When a user inputs keyword and classification data, the
searching program 66 searches the text index file 40
according to the keyword provided to find all the keyword
searching data 42 matched with the user provided keyword
data and the address data of the keyword in all the web
pages. Then, the searching program 66 searches the
classification index file 62 according to the keyword
searching data 42 to find the classification data 64 of the
web page of each matched keyword searching data 42. The
searching program 66 then uses the classification data
provided by the user to select the matched keyword
searching data 42 belonging to that classification, and
thereby finds the text data 38 of all web pages which
belong to the classification data and have the keyword.
Finally, the text data file 36 is used for transmitting the
text data 38 of all web pages belonging to the
classification data and having the keyword to the user.
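
The keyword-first variant of the system 60 can be sketched in the same spirit. Again this is only an illustration under assumptions: the text index is modelled as a mapping from keyword to page addresses, the classification index file as a mapping from page address to classification, and all sample data and names are hypothetical.

```python
# Hypothetical stand-ins for the text data file 36 and text index file 40.
TEXT_DATA_FILE = {
    "p1": "tornado chaser movie review",
    "p2": "tornado warning issued today",
    "p3": "romantic comedy movie review",
}
TEXT_INDEX = {
    "tornado": ["p1", "p2"],
    "movie": ["p1", "p3"],
}
# Hypothetical classification index: classification of each indexed page.
CLASSIFICATION_INDEX = {"p1": "movies", "p2": "meteorology", "p3": "movies"}


def search_keyword_first(text_index, classification_index,
                         text_data_file, keyword, classification):
    """Look up the keyword first, then keep pages of the wanted classification."""
    candidates = text_index.get(keyword.lower(), [])
    targets = [address for address in candidates
               if classification_index.get(address) == classification]
    return {address: text_data_file[address] for address in targets}


RESULTS = search_keyword_first(
    TEXT_INDEX, CLASSIFICATION_INDEX, TEXT_DATA_FILE, "tornado", "movies")
```

Here the classification acts as a filter applied after the keyword lookup, so the set of pages transmitted is the same as in the classification-first order; only the order of the two index lookups differs.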

[0014] The text searching system 30 uses the
classification index file 44 to find all web pages
belonging to the classification data provided by the user,
and then uses the text index file 40 and the keyword
provided by the user to find all the web pages belonging to
the classification data and having the keyword. The text
searching system 60 uses the text index file 40 to find all
web pages having the keyword provided by the user, and then
uses the classification index file 62 and the
classification data provided by the user to find all the
web pages belonging to the classification data and having
the keyword.

[0015] Compared with the prior art searching system 10,
the text searching systems 30, 60 according to the present
invention use keyword and classification data provided by
the user and find all the web pages that belong to the
classification data and have the keyword. The text
searching systems 30, 60 transmit to the user only the text
data of the web pages belonging to the classification data
and having the keyword. Therefore, the searching and
transmission time is greatly reduced and the text searching
system is more efficient.

Claims 

1. A text searching system (30, 60) comprising:

a computer having a memory (32) for storing programs and
data and a processor (34) for executing the programs stored
in the memory (32);

a text data file (36) stored in the memory (32) having text
data (38) of web pages of a plurality of world wide web
sites; and
a text index file (40) stored in the memory (32) having
keyword searching data (42) for searching keywords
contained in the text data (38) of each of the web pages of
the text data file (36);
characterized in that:

the text searching system (30,60) further comprises:

a classification index file (44,62) stored in the memory
(32) having classification data (46,64) corresponding to
the classification (54) of each of the web pages of the
text data file (36); and
a searching program (48,66) stored in the computer for
searching the text index file (40) and the classification
index file (44,62) according to keyword and classification
data (46,64) provided by a user so as to find text data
(38) which are matched with the user provided keyword data
and contained in a plurality of target web pages whose
classifications (54) are matched with the user provided
classification data (46,64) in the text data file (36).

2. The text searching system (30) of claim 1 wherein the
classification index file (44) contains a plurality of
classifications (54) and web page data (50) of all the web
pages belonging to each of the classifications (54), and
wherein the searching program (48) searches the
classification index file (44) to find all the target web
pages whose classifications (54) are matched with the user
provided classification data (46), and then searches the
text index file (40) to find text data (38) which are
matched with the user provided keyword data and contained
in the target web pages of the text data file (36).

3. The text searching system (30) of claim 2 wherein the
web page data (50) of each specific web page in the
classification index file (44) contain keyword position
indexing data (52) for pointing to the positions of the
keyword searching data (42) of the specific web page
contained in the text index file (40), and wherein the
searching program (48) searches the classification index
file (44) to find the positions of the keyword searching
data (42) of the target web pages in the text index file
(40), and then searches the keyword searching data (42) of
the target web pages to find the text data (38) which are
matched with the user provided keyword data and contained
in the target web pages of the text data file (36).

4. The text searching system (60) of claim 1 wherein the
classification index file (62) contains the classification
of the web page of each keyword searching data (42) in the
text index file (40), and wherein the searching program
(66) searches the text index file (40) to find all the
keyword searching data (42) matched with the user provided
keyword data, and then searches the classification index
file (62) to find the classification of the web page of
each matched keyword searching data (42) so as to locate
the keyword searching data (42) of the target web pages,
and finally finds the text data (38) contained in the text
data file (36) using the keyword searching data (42) of the
target web pages.






EP 1 056 024 A1 



Fig. 1 Prior art
[Block diagram: processor 14; memory 12 containing searching
program 24, text data file 16 (text data 18) and text index
file (keyword searching data)]




Fig. 2 Prior art
[Keyword searching data 22: keyword 21 and address data 23]
World: a1, a2, a3
World wide: b1, b2, b3
World wide web: c1, c2, c3




Fig. 3
[Block diagram: processor 34; memory 32 containing searching
program 48, text data file 36 (text data), text index file
(keyword searching data 42) and classification index file
(classification data 46, classification 54, web page data
50, keyword position indexing data 52)]





Fig. 4
[Block diagram: processor 34; memory containing searching
program 66, text data file (text data), text index file
(keyword searching data) and classification index file
(classification data)]





European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 99 10 9330

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of
relevant passages; relevant to claim:

WO 98 09229 A (TELEVITESSE SYSTEMS INC; STREATCH PAUL (CA);
REED JIN (CA)), 5 March 1998 (1998-03-05)
* the whole document *
Relevant to claim: 1,2,4

ANONYMOUS: "Taxonomized Web Search"
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 40, no. 5,
1 May 1997 (1997-05-01), pages 195-196, XP002133594
New York, US
* the whole document *
Relevant to claim: 1-4

HEARST M A ET AL: "CAT-A-CONE: AN INTERACTIVE INTERFACE FOR
SPECIFYING SEARCHES AND VIEWING RETRIEVAL RESULTS USING A
LARGE CATEGORY HIERARCHY"
ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON RESEARCH AND
DEVELOPMENT IN INFORMATION RETRIEVAL, US, NEW YORK, NY: ACM,
1997, pages 246-255, XP000782010 ISBN: 0-89791-836-3
* page 251, column 1, line 7 - page 251, column 2, line 28 *
* page 252, column 1, line 38 - page 253, column 1, line 42 *
Relevant to claim: 1-4

Classification of the application: G06F17/30
Technical fields searched: G06F

The present search report has been drawn up for all claims.
Place of search: THE HAGUE
Date of completion of the search: 21 March 2000
Examiner: Abbing, R




ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 99 10 9330

21-03-2000

Patent document cited in search report: WO 9809229 A
Publication date: 05-03-1998
Patent family members and publication dates:
CA 2184518 A    01-03-1998
AU 4007497 A    19-03-1998
EP 0922260 A    16-06-1999




